Spontaneous speech recognition using a massively parallel decoder

Authors

  • Takahiro Shinozaki
  • Sadaoki Furui
Abstract

Since spontaneous utterances include many variations, speaker- and task-independent general models do not work well. This paper proposes combining cluster-based language and acoustic models within the framework of a Massively Parallel Decoder (MPD). The MPD is a parallel decoder with a large number of decoding units, in which each unit is assigned one combination of element models. It runs efficiently on a parallel computer, so its turnaround time is comparable to that of a conventional decoder using a single model and a single processor. In experiments on lecture speech from the Corpus of Spontaneous Japanese, two types of cluster models were investigated: lecture-based cluster models and utterance-based cluster models. Utterance-based cluster models gave a significantly lower recognition error rate than lecture-based cluster models in both language and acoustic modeling. The experiments also showed that roughly 100 decoding units are enough in terms of recognition rate, and that the best setting yielded a 12% reduction in word error rate compared with the conventional decoder.
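The decoding-unit idea in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the `decode` function, its toy scoring, the dictionary fields, and the rule of keeping the highest-scoring hypothesis are all assumptions made for illustration, since the abstract does not say how the outputs of the units are combined.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import product


def decode(utterance, lm, am):
    """Hypothetical stand-in for one decoding unit: returns (hypothesis, score).

    A real unit would run a full recognition search with this model pair;
    the toy score below only makes the example runnable.
    """
    score = lm["bias"] + am["bias"] - abs(len(utterance) - lm["pref_len"])
    return f"{utterance}|{lm['name']}+{am['name']}", score


def massively_parallel_decode(utterance, language_models, acoustic_models,
                              max_workers=8):
    # One decoding unit per (language model, acoustic model) combination.
    units = list(product(language_models, acoustic_models))
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        results = list(pool.map(lambda u: decode(utterance, *u), units))
    # Assumption: keep the hypothesis with the highest score across all units.
    best = max(results, key=lambda r: r[1])
    return best, len(units)
```

With 10 language-model clusters and 10 acoustic-model clusters, this layout would give 100 units, which is the order of magnitude the abstract reports as sufficient.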


Similar resources

Speech Summarization using Weighte

This paper proposes an integrated framework to summarize spontaneous speech into written-style compact sentences. Most current speech recognition systems attempt to transcribe whole spoken words correctly. However, recognition results of spontaneous speech are usually difficult to understand, even if the recognition is perfect, because spontaneous speech includes redundant information, and its ...


Selected topics from 40 years of research on speech and speaker recognition

This paper summarizes my 40 years of research on speech and speaker recognition, focusing on selected topics that I have investigated at NTT Laboratories, Bell Laboratories and Tokyo Institute of Technology with my colleagues and students. These topics include: the importance of spectral dynamics in speech perception; speaker recognition methods using statistical features, cepstral features, an...


An optimized multi-duration HMM for spontaneous speech recognition

In spontaneous speech, various speech style and speed changes can be observed, which are known to degrade speech recognition accuracy. In this paper, we describe an optimized multi-duration HMM (OMD). An OMD is a kind of multi-path HMM with at most two parallel paths. Each path is trained using speech samples with short or long phoneme duration. The thresholds to divide samples of phonemes are ...


An Assessment of Automatic Recognition Techniques for Spontaneous Speech in Comparison with Human Performance

To investigate problems of spontaneous speech recognition using N-grams and HMMs and estimate the room for improvement in the recognition rate, an automatic speech recognizer is evaluated in comparison with performances by human listeners. The evaluation task is to recognize spontaneous speech presentations from the Corpus of Spontaneous Japanese. Both the automatic recognizer and human listene...


Articulatory Features and Associated Production Models in Statistical Speech Recognition

A statistical approach to speech recognition is outlined which draws close parallel with closed-loop human speech communication schematized as a joint process of encoding and decoding of linguistic messages. The encoder consists of the symbolically-valued overlapping articulatory feature model and of its interface to a nonlinear task-dynamic model of speech production. A general speech recogniz...




Publication date: 2004